Reading - Curious machines

Greg Detre

Monday, April 21, 2003

Schaal, S (1999). Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences 3:233-242.

learning from imitation

efficient motor learning

connection between action and perception

modular motor control in the form of motor primitives

development of humanoid robots

Honda humanoid robot

"As it is impossible to search such huge spaces for what constitutes a good action, it is necessary to either find more compact state-action representations, or to focus learning on those parts of the state-action space that are actually relevant for the movement task at hand. In the following article, we will review how the latter topic can be approached in the framework of imitation learning, while the former topic, i.e., compact state-action representations, will be shown to be a natural prerequisite for imitation learning in the form of movement primitives"

teleoperation???

how might you represent state-action pairs more compactly???

how do movement primitives help??? is it because they're aggregations of state-action pairs???

you need a goal to imitate the right aspects of an action, and you really need language to communicate goals...

how do you combine imitative with reinforcement learning???

task strategy vs task goal???

I suppose the strategy is your (relatively high-level) means of going about things, and your goal is the end-point or abstract description of the task that you're trying to achieve...

it will never be the "same" goal: always a mapped goal (i.e. swapping "him" for "me", etc.)

that�s a hard problem in itself

rough definitions: accommodation is reuse, assimilation is recognition

are these the Piagetian definitions???

are the rest of them standard/agreed definitions???

are "control policy" and "movement primitive" synonymous??? is a CP a series of MPs???

"In infant and animal studies, the ability to imitate is usually concluded from the subject's increased tendency to execute a previously demonstrated behavior. However, other causes can equally account for a higher probability of the subject's behavior, in particular priming, emulation, and response facilitation (Glossary); such causes are not to be mistaken with true imitation (8, 9). True imitation is present only if i) the imitated behavior is new for the imitator, ii) the same task strategy as that of the demonstrator is employed, and iii) the same task goal is accomplished"

why does imitation add more than supervised learning to the learning of communication??? (pg 5)

how does the idea of F5 as the monkey homologue of Broca's area work??? pg 7

important distinction made between program- and action-level imitation

"Action Level Imitation: The indiscriminate copying of the actions of the teacher without mapping them onto more abstract motor representation.

Program Level Imitation: A process by which the structural organization of a behavior is copied from observing a teacher, while the exact details of actions are filled in by individual learning."

but I don't understand task-level learning:

"Task Level Learning: Learning of a task can take place by learning an appropriate Control Policy that generates commands u on the actuator level, or by learning a Control Policy that generates commands in a more abstract but task related space, e.g., the space of the finger tip. The latter approach is called task-level learning and it requires additional transformations to map the task-level command into actuator space. Usually, errors in performance are more associated with task commands than actuator commands."
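A minimal sketch of the "additional transformations" the glossary mentions, assuming a planar two-joint arm with hypothetical unit link lengths (LINK1, LINK2): the task-level command is a desired fingertip velocity, and inverting the arm's Jacobian maps it into joint (actuator) velocities. This is the textbook resolved-rate approach, used here for illustration; it is not necessarily the transformation the paper has in mind. All function names are hypothetical.

```python
import math

LINK1, LINK2 = 1.0, 1.0  # hypothetical link lengths of a planar two-joint arm

def fingertip(q):
    """Forward kinematics: joint angles (q1, q2) -> fingertip position (x, y)."""
    q1, q2 = q
    return (LINK1 * math.cos(q1) + LINK2 * math.cos(q1 + q2),
            LINK1 * math.sin(q1) + LINK2 * math.sin(q1 + q2))

def jacobian(q):
    """2x2 matrix of partial derivatives of the fingertip position w.r.t. the joint angles."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-LINK1 * s1 - LINK2 * s12, -LINK2 * s12],
            [LINK1 * c1 + LINK2 * c12, LINK2 * c12]]

def task_to_actuator(q, v_task):
    """Map a task-level command (desired fingertip velocity) to an actuator-level
    command (joint velocities) by inverting the 2x2 Jacobian."""
    (a, b), (c, d) = jacobian(q)
    det = a * d - b * c  # equals LINK1 * LINK2 * sin(q2)
    if abs(det) < 1e-9:
        raise ValueError("singular posture: the task command has no unique joint solution")
    vx, vy = v_task
    return ((d * vx - b * vy) / det, (-c * vx + a * vy) / det)

# Example: move the fingertip right and slightly down from a bent-elbow posture.
dq = task_to_actuator((0.3, 1.2), (0.1, -0.05))
print(dq)
```

At a singular posture (arm fully stretched, q2 = 0) the mapping breaks down, which is one concrete reason the extra transformation in task-level learning is not free.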

important difficult definitions:

"Control Policy: A function that maps the state x of a movement system and its environment into an appropriate action u for a particular task, i.e., u = π(x, t, α). As indicated, the function π can directly depend on the time, t, and some additional parameters α that may be useful to adjust the policy for a particular task goal. Movement Primitives can be formalized in the form of control policies."

"Movement Primitive: A sequence of actions that can accomplish a certain movement goal. See Control Policy for a more formal definition."
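To make u = π(x, t, α) concrete, here is a minimal sketch of a movement primitive written as a control policy, assuming a 1-D unit point mass whose state x is (position, velocity) and a hypothetical PD law whose parameter α is the goal position. Changing α re-targets the same primitive; this particular π happens not to use t. The names reach_policy and rollout, and the gains, are illustrative assumptions.

```python
from typing import Callable, Tuple

State = Tuple[float, float]                      # (position, velocity) of a 1-D point mass
Policy = Callable[[State, float, float], float]  # u = pi(x, t, alpha)

def reach_policy(x: State, t: float, alpha: float) -> float:
    """Hypothetical movement primitive as a control policy: a PD law that pulls the
    position towards the goal alpha. This pi is autonomous: it ignores t."""
    k, d = 25.0, 10.0  # stiffness and damping, chosen for critical damping (d = 2*sqrt(k))
    pos, vel = x
    return k * (alpha - pos) - d * vel

def rollout(pi: Policy, x0: State, alpha: float, dt: float = 0.01, steps: int = 300) -> State:
    """Simulate a unit point mass (acceleration = u) under pi with explicit Euler steps."""
    pos, vel = x0
    for i in range(steps):
        u = pi((pos, vel), i * dt, alpha)
        pos, vel = pos + dt * vel, vel + dt * u
    return pos, vel

# The same primitive, re-targeted by changing only alpha.
print(rollout(reach_policy, (0.0, 0.0), alpha=1.0))
print(rollout(reach_policy, (0.5, 0.0), alpha=-2.0))
```

Under this reading a movement primitive is one parameterised policy; the sketch does not settle how the paper itself relates the two terms.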

movement primitives (pg 9)

sequences of action that accomplish a complete goal-directed behaviour

can be as simple as an elementary action in the symbolic approaches to imitation, e.g. go forward

do not scale well with many DoFs

interesting result:

connection from somatosensory cortex to the superior temporal sulcus (STs) in macaques: most of the form and motion neurons were insensitive to self-motion due to re-afferent signals

what else is required for full imitation than is provided by the mirror neurons??? pg 7

"some neurons in F5, called 'mirror neurons', were active both when the monkey observed a specific behavior and when it executed it itself (37). Mirror neurons fire highly specifically only to a special motor behavior with a particular object. These results are similar to those in STs (28), with the difference that neurons in STs do not respond to executed motor acts, but rather only to perceived ones."

what's the human analogue of STs (macaques)???

what's the likely human analogue of the STs-7b-F5 macaque imitation pathway???

is there any reason to believe there's just one??? more than one???

I suppose we're only talking about visually mediated motor movements... they've specifically ruled out verbally mediated ones...

might the pathway be slightly different for imitating non-humans??? after all, since mirror neurons are specific to humans, and to certain body parts and certain actions, it's quite possible that there is one pathway for imitating recognised actions (or actions whose goal is known) and other pathways for purely action-level imitation, right???

early symbolic approaches to imitation learning

state-action-state sequence was converted into if-then rules

this sort of FSM (???) is doomed because it will suffer from combinatorial explosion
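A sketch of what converting state-action-state sequences into if-then rules amounts to, using a hypothetical toy domain whose states are tuples of discrete features. The rule table only covers states that were demonstrated, and the number of states a complete table must cover grows exponentially with the number of features. The domain, the "stop" default and all names are illustrative assumptions.

```python
from itertools import product

def rules_from_demo(demo):
    """Turn a demonstrated list of (state, action, next_state) triples into
    rules 'IF state THEN action' (later demonstrations overwrite earlier ones)."""
    return {state: action for state, action, _next in demo}

def act(rules, state, default="stop"):
    """Fire the matching rule; a state never seen in the demonstration gets a default."""
    return rules.get(state, default)

def table_size(values_per_feature, n_features):
    """States a complete rule table must cover: exponential in the number of features."""
    return values_per_feature ** n_features

# Hypothetical toy domain: a state is (distance, bearing).
DISTANCES = ("far", "near", "holding")
BEARINGS = ("left", "ahead", "right")
demo = [
    (("far", "left"), "turn_right", ("far", "ahead")),
    (("far", "ahead"), "go_forward", ("near", "ahead")),
    (("near", "ahead"), "grasp", ("holding", "ahead")),
]
rules = rules_from_demo(demo)
all_states = list(product(DISTANCES, BEARINGS))
print(len(rules), "of", len(all_states), "states covered")
print(act(rules, ("near", "left")))
```

Even this toy demonstration leaves 6 of 9 states uncovered; with, say, 20 discretised joint variables at 10 values each, table_size(10, 20) is 10**20 states, which is the combinatorial explosion that dooms the lookup-table approach.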

difficulty synthesising new movements??? difficulty seeing analogous movements???

MPs: code complete temporal behaviours, like "grasping a cup", "walking", "a tennis serve"; a compact state-action representation where only a few parameters need to be adjusted for a specific goal
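To make "only a few parameters" concrete, here is a sketch of one very simple primitive: a minimum-jerk point-to-point reach (the classic Flash and Hogan profile, chosen for illustration; it is not the primitive the paper proposes). Start, goal and duration are the only parameters, and the whole time course follows from them. The function names are hypothetical.

```python
def min_jerk(x0, goal, duration, t):
    """Position at time t of a minimum-jerk point-to-point movement.
    Three parameters (start, goal, duration) specify the whole trajectory."""
    tau = min(max(t / duration, 0.0), 1.0)  # normalised time, clipped to [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5  # smooth 0 -> 1 profile
    return x0 + (goal - x0) * s

def sample(x0, goal, duration, n=5):
    """n evenly spaced samples of the primitive, for inspection."""
    return [min_jerk(x0, goal, duration, duration * i / (n - 1)) for i in range(n)]

# The same primitive with two different parameter settings.
print(sample(0.0, 1.0, 2.0))
print(sample(0.2, 0.8, 1.0))
```

Adapting the primitive to a new goal means changing one number, not relearning a state-action table.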

so the hard problem underlying an imitation system is building up a set of useful movement primitives

presumably you also need a system for generating new MPs, and noticing when to use them and when not to

"the perceived action of the teacher is mapped onto a set of existing primitives in an assimilation phase"

I like the idea that once you've decided what you need to do, and stored a high-level description of the action you're trying to achieve, then you can use supervised learning to improve your actual motor performance on this task (this is also mentioned somewhere above)

should there be a continuum between task strategy and task goal??? presumably the task strategy will be expressed at a higher and higher level as you get older, perhaps even in terms of the task goals you worked towards when you were younger

or are task goals some different category of representations???

do mirror neurons notice motor primitives then???

are there some motor primitives at a higher level???

in fact, is there something lower-level than a motor primitive???

surely

what would you call it then???

can you have imitation without any reinforcement signal???

surely

is there something qualitatively different about imitating something (right) after just one viewing???

presumably the imitative abilities (and their representations) and motor control develop simultaneously, right???

model-based learning??? predictive feedforward model???

presumably you use supervised learning to build a (forward) model of which motor commands lead to which change of state; then, when you want to produce some specific state, you can work backwards from it to the motor command to use???

can you feed the information backwards like this through a feedforward net???

I think so
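A sketch of that idea, with a single linear unit standing in for the feedforward net: supervised learning fits a forward model (command to change of state) from experience, and the model is then run "backwards" by freezing its weights and doing gradient descent on the input command instead. The plant, its coefficients, the learning rate and all names are hypothetical.

```python
import random

def plant(x, u):
    """Hypothetical 'true' dynamics, unknown to the learner: x_next = x + 0.5*u + 0.1."""
    return x + 0.5 * u + 0.1

def fit_forward_model(samples):
    """Supervised learning of a forward model dx = w*u + c (least squares)
    from experienced (x, u, x_next) triples."""
    us = [u for _, u, _ in samples]
    dxs = [x_next - x for x, _, x_next in samples]
    n = len(samples)
    mean_u, mean_dx = sum(us) / n, sum(dxs) / n
    var_u = sum((u - mean_u) ** 2 for u in us)
    cov = sum((u - mean_u) * (dx - mean_dx) for u, dx in zip(us, dxs))
    w = cov / var_u
    return w, mean_dx - w * mean_u

def invert_by_gradient(w, c, x, target, lr=0.5, steps=200):
    """Run the forward model 'backwards': keep the weights fixed and adjust the
    command u by gradient descent until the predicted next state reaches the target."""
    u = 0.0
    for _ in range(steps):
        predicted = x + w * u + c
        u -= lr * 2 * (predicted - target) * w  # d/du of (predicted - target)**2
    return u

rng = random.Random(0)
samples = []
for _ in range(50):  # random exploratory commands from random states
    x, u_cmd = rng.uniform(-1, 1), rng.uniform(-2, 2)
    samples.append((x, u_cmd, plant(x, u_cmd)))

w, c = fit_forward_model(samples)
u_star = invert_by_gradient(w, c, x=0.0, target=1.0)
print(w, c, u_star, plant(0.0, u_star))
```

The same input-gradient trick works for a multilayer net, since backpropagation gives the gradient of the output error with respect to the inputs as well as the weights; the catch is that many commands can produce the same outcome, so the inverse is not unique in general.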

"Interesting insights into these methods [for learning novel behaviours by imitation] can be gained by analyzing the process of how a perceived behavior is mapped onto a set of existing primitives. Two major questions become a) what is the matching criterion for recognizing a behavior, and b) in which coordinate frame does matching take place?"
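One possible answer to both questions, sketched under simple assumptions: trajectories are lists of 2-D end-effector points (so matching happens in that coordinate frame), they are resampled to a common length, and the matching criterion is mean squared distance. The primitive library, its entries and the function names are hypothetical.

```python
def resample(traj, n):
    """Linearly resample a list of (x, y) points to n points evenly spaced in time."""
    out = []
    for i in range(n):
        s = i * (len(traj) - 1) / (n - 1)  # fractional index into the original list
        j = min(int(s), len(traj) - 2)
        f = s - j
        (x0, y0), (x1, y1) = traj[j], traj[j + 1]
        out.append((x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
    return out

def mismatch(a, b):
    """Matching criterion: mean squared Euclidean distance between corresponding points."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)

def assimilate(observed, library, n=20):
    """Return the name of the stored primitive whose trajectory best matches the observation."""
    obs = resample(observed, n)
    return min(library, key=lambda name: mismatch(obs, resample(library[name], n)))

# Hypothetical primitive library, in end-effector coordinates.
library = {
    "reach_right": [(0.0, 0.0), (1.0, 0.0)],
    "reach_up": [(0.0, 0.0), (0.0, 1.0)],
}
observed = [(0.0, 0.0), (0.5, 0.05), (0.9, 0.1)]
print(assimilate(observed, library))
```

Because this criterion compares raw coordinates, the same reach performed from a different starting position or at a different size would match badly; that is the translation, scale and rotation problem these notes raise.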

via-point method???

presumably this is setting up waypoints in the teacher�s action that you try and reproduce???

y, and they can be used for classification (and presumably evaluation of success) too
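A sketch of one way via-points could be extracted from a demonstration, assuming a 1-D trajectory sampled at known times: start from the end points and greedily add the demonstrated sample that a piecewise-linear reconstruction misses by the most, until everything is within a tolerance. The greedy rule, the linear reconstruction and the names are simplifying assumptions; a real via-point model would regenerate the movement with a smooth profile.

```python
def interp(vias, t):
    """Piecewise-linear interpolation through via-points [(t, x), ...] sorted by t."""
    for (t0, x0), (t1, x1) in zip(vias, vias[1:]):
        if t0 <= t <= t1:
            return x0 + (x1 - x0) * (t - t0) / (t1 - t0)
    raise ValueError("t outside via-point range")

def extract_via_points(ts, xs, tol=0.01, max_points=20):
    """Greedy via-point extraction: start from the end points and keep adding the
    demonstrated sample that the current reconstruction misses by the most."""
    vias = [(ts[0], xs[0]), (ts[-1], xs[-1])]
    while len(vias) < max_points:
        errs = [abs(x - interp(vias, t)) for t, x in zip(ts, xs)]
        worst = max(range(len(ts)), key=errs.__getitem__)
        if errs[worst] <= tol:
            break
        vias = sorted(vias + [(ts[worst], xs[worst])])
    return vias

# Hypothetical demonstration: out to a peak at t = 0.5 and back.
ts = [i / 100 for i in range(101)]
xs = [min(t, 1 - t) for t in ts]
print(extract_via_points(ts, xs))
```

The handful of via-points that survives is a compact description of the movement, which is what would make them usable for classification as well as reproduction.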

various issues with translation, scale and rotation invariance
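A sketch of one way to obtain translation, rotation and scale invariance, assuming 2-D trajectories: re-express each movement in a frame attached to the movement itself (origin at the start, x-axis along the start-to-end direction, unit length equal to the start-to-end distance). This choice of frame and the name normalise are illustrative assumptions.

```python
import math

def normalise(traj):
    """Express a 2-D trajectory in a frame attached to the movement itself:
    origin at the first point, x-axis along the start-to-end direction,
    unit length = start-to-end distance. Removes translation, rotation and scale."""
    (x0, y0), (xe, ye) = traj[0], traj[-1]
    dx, dy = xe - x0, ye - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end coincide; this frame is undefined")
    c, s = dx / length, dy / length  # cos and sin of the start-to-end angle
    out = []
    for x, y in traj:
        px, py = x - x0, y - y0
        # rotate by minus the start-to-end angle, then rescale
        out.append(((c * px + s * py) / length, (-s * px + c * py) / length))
    return out

# The same hypothetical movement, scaled x3, rotated 90 degrees and shifted.
a = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
b = [(5.0 - 3 * y, -2.0 + 3 * x) for x, y in a]
print(normalise(a))
print(normalise(b))
```

The obvious weakness is movements that end where they start (a circle, a wave), for which this frame is undefined; a centroid-and-principal-axis frame would be one alternative.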

"the suggested bidirectional interaction between perception and action is noteworthy"???

bidirectional interaction of generative and recognition models in unsupervised learning

movement recognition is based on the movement generation system

"movement recognition based on forward models integrates smoothly with the simulation theory of mind"

see Meltzoff and Moore's "Active Intermodal Matching"

I like the idea of forward models put into multiple-model competition, but you do need some means of deciding between them
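On deciding between competing forward models: one common choice is to let them compete on prediction error, turning each model's accumulated squared error into softmax-normalised responsibilities (similar in spirit to MOSAIC-style multiple-model schemes). The sketch assumes 1-D positions, constant-speed forward models with hypothetical speeds, and a hypothetical noise scale sigma.

```python
import math

def make_model(speed):
    """Hypothetical forward model: predicts the next position assuming constant speed."""
    return lambda x, dt: x + speed * dt

def responsibilities(models, observed, dt, sigma=0.05):
    """Let forward models compete: accumulate each model's squared one-step prediction
    error on the observed positions, then convert errors into softmax weights."""
    errors = {
        name: sum((observed[k + 1] - f(observed[k], dt)) ** 2
                  for k in range(len(observed) - 1))
        for name, f in models.items()
    }
    best = min(errors.values())  # subtracted before exponentiating, for numerical stability
    weights = {name: math.exp(-(e - best) / (2 * sigma ** 2)) for name, e in errors.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

models = {"still": make_model(0.0), "slow": make_model(0.5), "fast": make_model(2.0)}
observed = [0.0, 0.1, 0.2, 0.3, 0.4]  # positions sampled every 0.05 s
r = responsibilities(models, observed, dt=0.05)
print(max(r, key=r.get), r)
```

The recognised movement is the model with the highest responsibility; sigma controls how sharply the competition is decided.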

"imitation learning could be conceived of as a research strategy that channels investigations in computational motor control towards the important topic of action-perception coupling"

is "tweening" the right term for an arbitrary trajectory???

do the feedforward models, via-point method and splining all do roughly the same thing???

I reckon the feedforward models (especially if they're in multiple competition) are more general (and heterogeneous) in the types of actions they can represent

is the splining used to interpolate between the via-points???
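A minimal sketch of splining between via-points: a natural cubic spline (C2-smooth, zero second derivative at both ends) through time-stamped via-points, using the standard tridiagonal formulation solved with the Thomas algorithm. Whether the methods discussed use this exact spline type is an assumption; note that the resulting curve is an explicit function of time t. The function name and example via-points are hypothetical.

```python
def natural_cubic_spline(ts, ys):
    """Return S(t) passing through every via-point (ts[i], ys[i]), C2-smooth,
    with zero second derivative at both ends ('natural' spline)."""
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    # Tridiagonal system for the second derivatives M[0..n]; rows 0 and n force M = 0.
    a = [0.0] * (n + 1)  # sub-diagonal
    b = [1.0] * (n + 1)  # diagonal
    c = [0.0] * (n + 1)  # super-diagonal
    d = [0.0] * (n + 1)  # right-hand side
    for i in range(1, n):
        a[i], b[i], c[i] = h[i - 1], 2 * (h[i - 1] + h[i]), h[i]
        d[i] = 6 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n + 1):  # Thomas algorithm: forward sweep
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    M = [0.0] * (n + 1)
    M[n] = d[n] / b[n]
    for i in range(n - 1, -1, -1):  # back substitution
        M[i] = (d[i] - c[i] * M[i + 1]) / b[i]

    def S(t):
        i = 0  # find the segment containing t (clamped to the first/last segment)
        while i < n - 1 and t > ts[i + 1]:
            i += 1
        dl, dr = t - ts[i], ts[i + 1] - t
        return ((M[i] * dr ** 3 + M[i + 1] * dl ** 3) / (6 * h[i])
                + (ys[i] / h[i] - M[i] * h[i] / 6) * dr
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6) * dl)
    return S

S = natural_cubic_spline([0.0, 0.4, 1.0], [0.0, 0.7, 1.0])  # hypothetical via-points
print([round(S(i / 10), 3) for i in range(11)])
```

SciPy's CubicSpline with bc_type="natural" computes the same kind of curve if a library is available.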

Schaal, S, Ijspeert, A, Billard, A (2003). Computational approaches to motor learning by imitation. Philosophical Transactions of the Royal Society of London: Series B, Biological Sciences 358:537-547

I think a task-specific control policy chooses between the motor primitives...???

hmmm, maybe

or maybe they're synonymous

I didn't understand the need for, or the distinction between, movement planning and execution??? (pg 6)

end-effector???

splines as nonautonomous, "since the viapoints defining the splines are parameterised explicitly in time"???

not robust in coping with unforeseen perturbations of the movement
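A toy illustration of that non-robustness, under stated assumptions: a 1-D reach whose limb is held at its start position for the first half of the planned duration. The time-indexed plan (a minimum-jerk curve standing in for a spline) keeps running while the limb is held, so on release it demands an instantaneous jump. An autonomous point attractor (a critically damped spring-damper, a stripped-down relative of the dynamical-systems primitives Ijspeert and Schaal proposed) computes its next desired state from the current state, so it simply resumes from where the limb is. Gains, durations, step size and names are hypothetical.

```python
def spline_plan(t, x0=0.0, goal=1.0, T=1.0):
    """Nonautonomous plan: the desired position depends on clock time only.
    (A minimum-jerk curve stands in for a spline through via-points.)"""
    tau = min(max(t / T, 0.0), 1.0)
    return x0 + (goal - x0) * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

def attractor_step(x, v, dt, goal=1.0, k=100.0, d=20.0):
    """Autonomous plan: one Euler step of a critically damped spring-damper pulled
    towards the goal. The next desired state depends only on the current state."""
    a = k * (goal - x) - d * v
    return x + dt * v, v + dt * a

def compare_after_block(block_until=0.5, dt=0.001, t_end=2.0):
    """The limb is held at rest at x = 0 until block_until, then released.
    Returns (gap between the spline's desired position and the limb at release,
             gap between the attractor's next desired position and the limb,
             attractor position at t_end)."""
    x_held = 0.0
    spline_gap = abs(spline_plan(block_until) - x_held)
    x, v = x_held, 0.0  # the attractor restarts from the limb's actual state
    next_x, _ = attractor_step(x, v, dt)
    attractor_gap = abs(next_x - x_held)
    t = block_until
    while t < t_end:
        x, v = attractor_step(x, v, dt)
        t += dt
    return spline_gap, attractor_gap, x

print(compare_after_block())
```

The comparison is deliberately crude: it only looks at the desired position at release. The price of the autonomous formulation is that the original timing is no longer guaranteed.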